AITopics | dropout 0

In this appendix, we first introduce the datasets and evaluation metrics used in the experiments in Section A. Then, we provide extra experimental results in Section B. In Section C, we present details of network design, training scheme, and hyper-parameter tuning. We conduct experiments on 11 popular time series datasets: (1) Electricity Transformer Temperature [42] (ETTh(1,2), ETTm1) consists of two years of electric power data collected from two separate counties of China. Each data point includes an "oil temperature" value and 6 power load features. The data is aggregated into 5-minute windows, resulting in 12 points per hour and 288 points per day. A.1 Electricity Transformer Temperature (ETT): For data pre-processing, we perform zero-mean normalization, i.e., X We use Mean Absolute Error (MAE) [17] and Mean Squared Error (MSE) [26] for model comparison.
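
The two comparison metrics named in this excerpt can be sketched as follows. This is a minimal illustration, assuming targets and forecasts are flat lists of equal length; the function names are hypothetical and do not come from the paper.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE: average of |target - forecast| over all points."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """MSE: average of (target - forecast)^2 over all points."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy values for illustration only; not data from the paper.
    targets = [1.0, 2.0, 3.0]
    forecasts = [1.0, 2.0, 5.0]
    print("MAE:", mean_absolute_error(targets, forecasts))
    print("MSE:", mean_squared_error(targets, forecasts))
```

MSE penalizes large errors more heavily than MAE, which is one reason forecasting papers often report both.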

artificial intelligence, dataset, machine learning, (19 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.29)

Industry:

Energy > Power Industry (1.00)
Energy > Renewable > Solar (0.33)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

23ee05bf1f4ade71c0f8f5ca722df601-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems | Apr-25-2026, 02:24:23 GMT

artificial intelligence, engcn, machine learning, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Supplementary Material for P-Flow

Neural Information Processing Systems | Feb-17-2026, 18:42:28 GMT

The link to our demo page is https://bit.ly/3ID5Zam. We present the objective metrics according to the Euler steps in the result section of the main paper. We measure the acoustic quality using 5-scale Mean Opinion Scores (MOS).

artificial intelligence, natural language, representation, (15 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.32)

df4f371f1f89ec8ba5014b3310578048-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-12-2026, 09:32:04 GMT

computational linguistic, hyperparameter, sentencepiece, (7 more...)

Country: Europe > Belgium > Brussels-Capital Region > Brussels (0.06)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Appendix

Neural Information Processing Systems | Feb-11-2026, 20:40:53 GMT

We held out a validation set from the training set, and used this validation set to select the L2 regularization hyperparameter, which we selected from 45 logarithmically spaced values between $10^{-6}$ and $10^{5}$, applied to the sum of the per-example losses. Because the optimization problem is convex, we used the previous weights as a warm start as we increased the L2 regularization hyperparameter. We measured either top-1 or mean per-class accuracy, depending on which was suggested by the dataset creators. A.3 Fine-tuning: In our fine-tuning experiments in Table 2, we used standard ImageNet-style data augmentation and trained for 20,000 steps with SGD with momentum of 0.9 and cosine annealing [20] without restarts. Each curve represents a different model.
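
The hyperparameter grid and learning-rate schedule described in this excerpt can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the grid endpoints are read as 10^-6 and 10^5 (the exponent signs are garbled in the extracted text), and the base learning rate, minimum learning rate, and function names are hypothetical, since the excerpt does not give them.

```python
import math


def log_spaced_grid(low_exp=-6.0, high_exp=5.0, num=45):
    """Return `num` values spaced evenly in log10 between 10**low_exp and 10**high_exp."""
    step = (high_exp - low_exp) / (num - 1)
    return [10.0 ** (low_exp + i * step) for i in range(num)]


def cosine_annealed_lr(step, total_steps=20_000, base_lr=0.1, min_lr=0.0):
    """Cosine annealing without restarts: decay from base_lr to min_lr over total_steps.

    base_lr and min_lr are placeholder values, not taken from the paper.
    """
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    grid = log_spaced_grid()
    print(len(grid), grid[0], grid[-1])
    for s in (0, 10_000, 20_000):
        print(s, cosine_annealed_lr(s))
```

Sweeping the grid in increasing order matches the warm-start strategy in the text: the solution for one regularization strength initializes the solve for the next larger one.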

accuracy, artificial intelligence, machine learning, (18 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)

Supplementary Materials for: "Domain Adaptation with Invariant Representation Learning: What Transformations to Learn?"

Neural Information Processing Systems | Feb-11-2026, 06:36:57 GMT

Furthermore, let $\phi: X \to Z$ be an encoder s.t. Then, there is no function $\phi$ s.t. Let there be a subset in the invariant space $B \subseteq Z$, and suppose that we have marginal invariance in the latent space: $P_S(\phi(X) \in B) = P_T(\phi(X) \in B), \forall B$. Define the pre-image of $B$ as: $A = \{a \in X : \phi(a) \in B\}$. Let $A \subseteq X$ be a region s.t. We followed the procedure in [2], and used a mixture kernel function of $q$ RBF kernels: $\kappa(z_1, z_2) = \sum_{i=1}^{q} \eta_i \exp\{-\|z_1 - z_2\|^2 / \sigma_i^2\}$, where $\sigma_i^2$ is the kernel width of the $i$-th kernel, and $\eta_i$ is a mixing weight which we set to $1/q$.
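
The mixture kernel in this excerpt can be sketched as below. This is a minimal illustration, assuming the kernel width divides the squared distance inside the exponent (the extracted formula is garbled at that point) and uniform mixing weights of 1/q as stated in the text. The function name and the example widths are hypothetical.

```python
import math


def mixture_rbf_kernel(z1, z2, sigma_sq):
    """Mixture of q RBF kernels with uniform mixing weights eta_i = 1/q.

    z1, z2   : equal-length sequences of floats (latent vectors)
    sigma_sq : sequence of q positive kernel widths sigma_i^2
    """
    if len(z1) != len(z2):
        raise ValueError("z1 and z2 must have the same dimension")
    if not sigma_sq or any(s <= 0 for s in sigma_sq):
        raise ValueError("sigma_sq must contain positive widths")
    sq_dist = sum((a - b) ** 2 for a, b in zip(z1, z2))
    eta = 1.0 / len(sigma_sq)
    return sum(eta * math.exp(-sq_dist / s) for s in sigma_sq)


if __name__ == "__main__":
    widths = [0.5, 1.0, 2.0]  # illustrative widths, not from the paper
    print(mixture_rbf_kernel([0.0, 0.0], [1.0, 1.0], widths))
```

Mixing several widths makes the kernel sensitive to differences at more than one length scale, which avoids having to tune a single width.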

adaptation, artificial intelligence, machine learning, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

c2c2a04512b35d13102459f8784f1a2d-Supplemental.pdf

Neural Information Processing Systems | Feb-11-2026, 01:01:03 GMT

The task is to determine whether the sentence has positive or negative sentiment. The task is to determine whether a given sentence is linguistically acceptable or not. RTE: Recognizing Textual Entailment [2, 10, 21, 17] contains 2.5K train examples from textual entailment challenges. The fine-tuning costs are the same as for BERT plus relative position encodings, as the same Transformer model is used.

artificial intelligence, natural language, weightdecay 0, (7 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.72)
